Making code efficient*

Muhammad Ramzan

Join Date: Jan 2015

Posts: 173
#1

Making code efficient*

10 Nov 2022, 21:47

Hi I have written a code to convert the files from excel to state and merge variables from two different files and lastly append the data sets. is there a way to make the code efficient please it is taking a lot of time

global temp_directory "C:\Users\ramzan\Downloads\Compressed\air_temp_201 4"

global precip_directory "C:\Users\ramzan\Downloads\Compressed\precip_2 017"

global years 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014

******* Import the temp files into stata format
cd "$temp_directory"
foreach dataset in $years{
clear
import excel using air_temp_`dataset'.xlsx, sheet("Sheet1") firstrow
destring Year, replace
*export excel using air_temp_`var'.xlsx , firstrow(variables) replace
rename (Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec) (temp#), addnumber
reshape long temp, i(lat longitude Year) j(month)
label variable lat "Latitude"
generate date = ym(Year,month), after(month)
format %tm date
egen id_g = group (lat longitude)

*append using air_temp_
cd "$temp_directory"
save air_temp_`dataset'.dta , replace

}

******* Import the precip files into stata format
cd $precip_directory
foreach dataset in $years {
clear

import excel using precip_`dataset'.xlsx, sheet("Sheet1") firstrow
destring Year, replace
*export excel using air_temp_`var'.xlsx , firstrow(variables) replace
rename (Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec) (precip#), addnumber
reshape long precip, i(lat longitude Year) j(month)
label variable lat "Latitude"
generate date = ym(Year,month), after(month)
egen id_g = group (lat longitude)
format %tm date
*append using air_temp_
save precip_`dataset'.dta , replace

}

******* Merge temp and precip files

foreach dataset in $years {
cd $temp_directory
use air_temp_`dataset'.dta, clear
*rename Year year
cd $precip_directory
merge 1:1 id_g month Year using precip_`dataset'
drop _merge id_g
cd $temp_directory
save air_temp_`dataset'.dta, replace

}

******* Append temp and precip files
clear
use "air_temp_1981.dta", clear /* select first file*/
foreach dataset in 1982{
append using air_temp_`dataset' /*Combine datasets together*/
sort Year month ID
save air_temp_all.dta , replace
erase air_temp_`dataset'.dta /*Erase files we're done with*/

}
gen temp_c = (temp-32)*(5/9)

use air_temp_all.dta,clear
Tags: None
Maarten Buis

Join Date: Mar 2014

Posts: 3494
#2

11 Nov 2022, 01:35

reshape can be slow in large datasets. You could look at fastreshape or sreshape on SSC (ssc desc fastreshape and ssc desc sreshape), to see if that noticeably improves speed in your case.

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#3

11 Nov 2022, 09:13

Also available on SSC, and faster than official Stata's -reshape long- are -tolong- and -freshape-. And there is also -greshape- available as part of the -gtools- package at github.com/mcaceresb/stata-gtools.

That said, you are reading and writing a large number of files. That is going to be slow if the files are of any appreciable size. It will be slower still if they are located on a drive that has to be accessed over a network (although I see they are on your C: drive, so presumably that is local).

I also notice that your final loop starts with -foreach dataset in 1982{-, and that after -use-ing a 1981 file. So after generating intermediate results for 35 years, in the end you use only years 1981 and 1982. You could save a great deal of time by not bothering with the other years in the other loops. In fact, just changing the definition of global macro years to -global years 1981 1982- will reduce the entire computation by a factor of 17.5.
Comment
Muhammad Ramzan

Join Date: Jan 2015

Posts: 173
#4

12 Nov 2022, 00:59

Originally posted by Clyde Schechter View Post

Also available on SSC, and faster than official Stata's -reshape long- are -tolong- and -freshape-. And there is also -greshape- available as part of the -gtools- package at github.com/mcaceresb/stata-gtools.

That said, you are reading and writing a large number of files. That is going to be slow if the files are of any appreciable size. It will be slower still if they are located on a drive that has to be accessed over a network (although I see they are on your C: drive, so presumably that is local).

I also notice that your final loop starts with -foreach dataset in 1982{-, and that after -use-ing a 1981 file. So after generating intermediate results for 35 years, in the end you use only years 1981 and 1982. You could save a great deal of time by not bothering with the other years in the other loops. In fact, just changing the definition of global macro years to -global years 1981 1982- will reduce the entire computation by a factor of 17.5.

Thanks, Clyde Schechter

******* Append temp and precip files
clear
use "air_temp_1981.dta", clear /* select first file*/
foreach dataset in 1982{
append using air_temp_`dataset' /*Combine datasets together*/
sort Year month ID
save air_temp_all.dta , replace
erase air_temp_`dataset'.dta /*Erase files we're done with*/

}

actually, this was only for testing and practice

I have to append it for all of the years defined in the global variable.

I have a question is there a way to define THE first value in $years and do the operation on all the values in $years expect the first value?

Thanks
Comment
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1548
#5

12 Nov 2022, 01:42

Muhammad Ramzan you seem to need all years from 1980-2014. It might be easier to directly specify a numlist in the loop instead of the global. Also, if your question in #4 is because of the append command, you can instead just initialise an empty file and then append everything. So your code would look something like this:

Code:

clear save air_temp_all, replace emptyok foreach dataset of numlist 1980/2014 { dis "Appending dataset for the year `dataset'..." append using air_temp_`dataset' /*Combine datasets together*/ sort Year month ID save air_temp_all , replace erase air_temp_`dataset'.dta /*Erase files we're done with*/ }

Last edited by Hemanshu Kumar; 12 Nov 2022, 01:46.
2 likes
Comment
Muhammad Ramzan

Join Date: Jan 2015

Posts: 173
#6

12 Nov 2022, 19:17

Originally posted by Hemanshu Kumar View Post

Muhammad Ramzan you seem to need all years from 1980-2014. It might be easier to directly specify a numlist in the loop instead of the global. Also, if your question in #4 is because of the append command, you can instead just initialise an empty file and then append everything. So your code would look something like this:

Code:

clear save air_temp_all, replace emptyok foreach dataset of numlist 1980/2014 { dis "Appending dataset for the year `dataset'..." append using air_temp_`dataset' /*Combine datasets together*/ sort Year month ID save air_temp_all , replace erase air_temp_`dataset'.dta /*Erase files we're done with*/ }

Thanks @Hemanshu Kumar

I have a question how I can change the file name from
save air_temp_all , replace to
save air_temp_1980_2004.dta , replace air_temp_firstyear from numlist _ last value form numlist

Thanks
Comment
Muhammad Ramzan

Join Date: Jan 2015

Posts: 173
#7

12 Nov 2022, 19:23

and will doing qui use and qui save will speed up the calculations or not

thanks
Comment

William Lisowski

Join Date: Dec 2014
Posts: 10150

12 Nov 2022, 19:35

Perhaps

Code:

local fyr 1980
local lyr 2014
clear
save air_temp_`fyr'_`lyr', replace emptyok

foreach dataset of numlist `fyr'/`lyr' {
    dis "Appending dataset for the year `dataset'..."
    append using air_temp_`dataset' /*Combine datasets together*/
    sort Year month ID
    save air_temp_`fyr'_`lyr , replace
    erase air_temp_`dataset'.dta /*Erase files we're done with*/
}

and will doing qui use and qui save will speed up the calculations or not

No, the quietly prefix in general will not speed up the processing significantly, but be sure you have

Code:

set more off

so Stata doesn't pause each time the Results window is full.

Comment

Muhammad Ramzan

Join Date: Jan 2015

Posts: 173
#9

12 Nov 2022, 19:59

Originally posted by William Lisowski View Post

Perhaps

Code:

local fyr 1980 local lyr 2014 clear save air_temp_`fyr'_`lyr', replace emptyok foreach dataset of numlist `fyr'/`lyr' { dis "Appending dataset for the year `dataset'..." append using air_temp_`dataset' /*Combine datasets together*/ sort Year month ID save air_temp_`fyr'_`lyr , replace erase air_temp_`dataset'.dta /*Erase files we're done with*/ }

No, the quietly prefix in general will not speed up the processing significantly, but be sure you have

Code:

set more off

so Stata doesn't pause each time the Results window is full.

. foreach dataset of numlist `fyr'/`lyr' {
2. dis "Appending dataset for the year `dataset'..."
3. append using air_temp_`dataset' /*Combine datasets together*/
4. sort Year month ID
5. save air_temp_`fyr'_`lyr , replace
6. erase air_temp_`dataset'.dta /*Erase files we're done with*/
7. }
Appending dataset for the year 2013...
invalid '`'

Thanks I am getting the following error message

please
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#10

12 Nov 2022, 20:08

Please compare the two save commands and add the missing character to the second one.
Comment
Muhammad Ramzan

Join Date: Jan 2015

Posts: 173
#11

12 Nov 2022, 20:38

Originally posted by William Lisowski View Post

Please compare the two save commands and add the missing character to the second one.

Thanks
Comment

Announcement

Making code efficient*

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment